A novel imbalanced data classification approach using both under and over sampling
نویسندگان
چکیده
The performance of the data classification has encountered a problem when distribution is imbalanced. This fact results in classifiers tend to majority class which most instances. One popular approaches balance dataset using over and under sampling methods. paper presents novel pre-processing technique that performs both algorithms for an imbalanced dataset. proposed method uses SMOTE algorithm increase minority class. Moreover, cluster-based approach performed decrease takes into consideration new size experimental on 10 datasets show suggested better comparison previous approaches.
منابع مشابه
Borderline over-sampling for imbalanced data classification
Traditional classification algorithms, in many times, perform poorly on imbalanced data sets in which some classes are heavily outnumbered by the remaining classes. For this kind of data, minority class instances, which are usually much more of interest, are often misclassified. The paper proposes a method to deal with them by changing class distribution through oversampling at the borderline b...
متن کاملClassification of Imbalanced Data Using Synthetic Over-Sampling Techniques
of the Thesis Classification of Imbalanced Data Using Synthetic Over-Sampling Techniques
متن کاملA Novel Approach to Handle Imbalanced Data for Classification
This paper attempts to propose a particle swarm K-means optimization (PSKO)-based granular computing (GrC) model to preprocess the skewed class distribution in order to enhance the classification accuracy for class imbalance problem. The GrC model acquires knowledge from information granules rather than from numerical data. It also processes multi-dimensional and sparse data by using singular v...
متن کاملCUSBoost: Cluster-based Under-sampling with Boosting for Imbalanced Classification
Class imbalance classification is a challenging research problem in data mining and machine learning, as most of the real-life datasets are often imbalanced in nature. Existing learning algorithms maximise the classification accuracy by correctly classifying the majority class, but misclassify the minority class. However, the minority class instances are representing the concept with greater in...
متن کاملThe Imbalanced Training Sample Problem: Under or over Sampling?
The problem of imbalanced training sets in supervised pattern recognition methods is receiving growing attention. Imbalanced training sample means that one class is represented by a large number of examples while the other is represented by only a few. It has been observed that this situation, which arises in several practical domains, may produce an important deterioration of the classificatio...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: Bulletin of Electrical Engineering and Informatics
سال: 2021
ISSN: ['2302-9285']
DOI: https://doi.org/10.11591/eei.v10i5.2785